Use page tracking for snapshot and restore #683

simongdavies · 2025-07-01T20:40:44Z

This PR makes it so sandbox Snapshots no longer contain a full copy of memory at the time of snapshot. Instead, only pages that have been dirtied since the last snapshot are included. This improves performance due to less memory copying when restoring a snapshot, and also (usually) reduces memory consumption due to not having to keep full copies around.

The PR first introduces tracking of dirty pages. It's done on two levels:

In the host manually when writing to shared memory.
In the vm by the hypervisor.

The union is taken of these two sets to determine which memory has been dirtied since last snapshot.

Since snapshots no longer contain a full copy of memory, restoring memory from a snapshot is a bit more complicated.
The algorithm this PR uses goes something like this:

Function: Restore memory to snapshot N given that sandbox was most recently at snapshot M (N could be <, >, or == M)

Step 1: Determine pages to restore
	Start with all currently dirty pages
	If N == M: Only restore currently dirty pages
	If N < M (rollback): Add all pages that exist in snapshots M, M-1, ..., N+1 (walking backwards from M to N, exclusive of N)
	If N > M (fast-forward): Add all pages that exist in snapshots N, N-1, ..., M+1 (walking backwards from N to M, exclusive of M)
	If no previous snapshot (M = None): Add all pages that exist in N and all its ancestors

Step 2: Restore each page
	For each page p in the combined set:
		Search for page p starting from snapshot N and walking backwards through its ancestors (N, N-1, N-2, ...)
		If found in any snapshot: Restore from that snapshot's data
		If not found in any snapshot: Zero the page (return to initial state)

MSHV3 microbenchmarks vs main branch (on a VM)

azureuser@aure-linux3-dev-vm [ ~/hyperlight ]$ cargo bench --profile release -- --load-baseline new_pr_mshv --save-baseline new_main_mshv
    Finished `release` profile [optimized] target(s) in 0.16s
     Running benches/benchmarks.rs (target/release/deps/benchmarks-db619ff9b478f950)
Gnuplot not found, using plotters backend
guest_functions/guest_call
                        time:   [67.542 µs 68.089 µs 68.526 µs]
                        change: [−1.2082% +0.1264% +1.4700%] (p = 0.86 > 0.05)
                        No change in performance detected.
guest_functions/guest_restore
                        time:   [20.162 µs 20.293 µs 20.439 µs]
                        change: [+0.1955% +2.1241% +4.0820%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
guest_functions/guest_call_with_restore
                        time:   [157.94 µs 158.43 µs 158.92 µs]
                        change: [+56.387% +58.390% +59.707%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
guest_functions/guest_call_with_call_to_host_function
                        time:   [310.00 µs 311.67 µs 313.40 µs]
                        change: [−3.4428% −2.8296% −2.2254%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

sandboxes/create_uninitialized_sandbox
                        time:   [386.31 µs 387.47 µs 388.68 µs]
                        change: [+7.0542% +7.5942% +8.2346%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
sandboxes/create_uninitialized_sandbox_and_drop
                        time:   [448.14 µs 448.95 µs 449.91 µs]
                        change: [+6.6067% +6.8878% +7.1789%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe
sandboxes/create_sandbox
                        time:   [1.3778 ms 1.3898 ms 1.4019 ms]
                        change: [+4.1825% +5.9375% +7.7199%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
sandboxes/create_sandbox_and_drop
                        time:   [33.784 ms 34.339 ms 34.926 ms]
                        change: [−1.6906% +0.5839% +3.0143%] (p = 0.63 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

sandbox_heap_sizes/create_sandbox_default_heap
                        time:   [1.4358 ms 1.4487 ms 1.4623 ms]
                        change: [+6.5125% +7.9527% +9.3471%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
sandbox_heap_sizes/create_sandbox_50mb_heap
                        time:   [32.056 ms 32.142 ms 32.238 ms]
                        change: [+4.5466% +4.9531% +5.3459%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
sandbox_heap_sizes/create_sandbox_500mb_heap
                        time:   [299.92 ms 300.96 ms 302.29 ms]
                        change: [+1.9729% +2.4132% +2.9360%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
sandbox_heap_sizes/create_sandbox_995mb_heap
                        time:   [590.06 ms 590.81 ms 591.64 ms]
                        change: [+1.0550% +1.3629% +1.5978%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

2_large_parameters/guest_call_restore_5mb_params
                        time:   [52.219 ms 52.306 ms 52.396 ms]
                        change: [−31.848% −30.444% −29.144%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
2_large_parameters/guest_call_restore_20mb_params
                        time:   [211.97 ms 212.31 ms 212.71 ms]
                        change: [−13.969% −11.336% −8.5797%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
2_large_parameters/guest_call_restore_60mb_params
                        time:   [601.85 ms 605.99 ms 610.98 ms]
                        change: [+24.901% +26.688% +28.462%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  8 (8.00%) high severe

guest_call_restore_heap_sizes/guest_call_restore_default_heap
                        time:   [160.57 µs 162.40 µs 164.57 µs]
                        change: [+70.745% +72.839% +75.092%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe
guest_call_restore_heap_sizes/guest_call_restore_50mb_heap
                        time:   [417.92 µs 419.58 µs 421.69 µs]
                        change: [−80.442% −80.263% −80.071%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
guest_call_restore_heap_sizes/guest_call_restore_500mb_heap
                        time:   [2.7581 ms 2.7617 ms 2.7662 ms]
                        change: [−87.626% −87.576% −87.528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
guest_call_restore_heap_sizes/guest_call_restore_995mb_heap
                        time:   [5.2304 ms 5.2474 ms 5.2754 ms]
                        change: [−90.122% −89.824% −89.518%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

snapshot/default_heap   time:   [22.415 µs 22.614 µs 22.880 µs]
                        change: [+8.7437% +10.398% +12.225%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
snapshot/50_mb_heap     time:   [273.43 µs 273.93 µs 274.57 µs]
                        change: [−98.709% −98.704% −98.699%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  1 (1.00%) high mild
  17 (17.00%) high severe
snapshot/500_mb_heap    time:   [2.5732 ms 2.5797 ms 2.5891 ms]
                        change: [−98.676% −98.664% −98.653%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
snapshot/995_mb_heap    time:   [5.0826 ms 5.0971 ms 5.1208 ms]
                        change: [−99.520% −99.512% −99.503%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

KVM microbenchmarks vs main branch (on a wsl)

ludde@ludvig-thinkpad:~/hyperlight-new$ cargo bench --profile release -- --load-baseline new_pr --baseline new_main
    Finished `release` profile [optimized] target(s) in 0.54s
     Running benches/benchmarks.rs (target/release/deps/benchmarks-253a48cada671a19)
Gnuplot not found, using plotters backend
guest_functions/guest_call
                        time:   [13.908 µs 14.046 µs 14.200 µs]
                        change: [−23.042% −19.582% −15.931%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe
guest_functions/guest_restore
                        time:   [9.6810 µs 9.8001 µs 9.9421 µs]
                        change: [−42.284% −40.142% −38.049%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
guest_functions/guest_call_with_restore
                        time:   [69.005 µs 70.373 µs 72.045 µs]
                        change: [+44.369% +58.109% +72.305%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) high mild
  7 (7.00%) high severe
guest_functions/guest_call_with_call_to_host_function
                        time:   [377.03 µs 382.01 µs 388.41 µs]
                        change: [−11.386% −8.8335% −6.2630%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

sandboxes/create_uninitialized_sandbox
                        time:   [226.99 µs 231.31 µs 236.45 µs]
                        change: [−3.1739% −0.3462% +2.5131%] (p = 0.81 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
sandboxes/create_uninitialized_sandbox_and_drop
                        time:   [227.20 µs 230.25 µs 233.84 µs]
                        change: [+5.6581% +8.5039% +11.149%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe
sandboxes/create_sandbox
                        time:   [1.0948 ms 1.1111 ms 1.1297 ms]
                        change: [−7.9543% −3.7963% +0.8833%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
sandboxes/create_sandbox_and_drop
                        time:   [1.9554 ms 1.9835 ms 2.0152 ms]
                        change: [−4.1234% −1.1564% +1.6366%] (p = 0.44 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

sandbox_heap_sizes/create_sandbox_default_heap
                        time:   [1.2194 ms 1.2454 ms 1.2765 ms]
                        change: [+9.0826% +14.088% +19.624%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
sandbox_heap_sizes/create_sandbox_50mb_heap
                        time:   [1.5619 ms 1.5915 ms 1.6229 ms]
                        change: [+2.9032% +5.6611% +8.6451%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
sandbox_heap_sizes/create_sandbox_500mb_heap
                        time:   [2.6730 ms 2.7462 ms 2.8405 ms]
                        change: [+12.363% +16.198% +20.804%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
sandbox_heap_sizes/create_sandbox_995mb_heap
                        time:   [3.7462 ms 3.8141 ms 3.8888 ms]
                        change: [+4.9046% +7.3384% +9.9302%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

2_large_parameters/guest_call_restore_5mb_params
                        time:   [91.571 ms 92.535 ms 93.557 ms]
                        change: [−51.474% −50.725% −49.930%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
2_large_parameters/guest_call_restore_20mb_params
                        time:   [356.37 ms 360.05 ms 363.86 ms]
                        change: [−11.216% −9.6227% −8.0211%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
2_large_parameters/guest_call_restore_60mb_params
                        time:   [1.0638 s 1.0767 s 1.0901 s]
                        change: [+13.741% +15.471% +17.179%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

guest_call_restore_heap_sizes/guest_call_restore_default_heap
                        time:   [79.753 µs 81.480 µs 83.377 µs]
                        change: [+78.458% +86.354% +93.528%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
guest_call_restore_heap_sizes/guest_call_restore_50mb_heap
                        time:   [86.856 µs 88.864 µs 91.050 µs]
                        change: [−98.410% −98.324% −98.241%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
guest_call_restore_heap_sizes/guest_call_restore_500mb_heap
                        time:   [99.096 µs 101.65 µs 104.43 µs]
                        change: [−99.818% −99.813% −99.808%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
guest_call_restore_heap_sizes/guest_call_restore_995mb_heap
                        time:   [112.63 µs 115.11 µs 118.16 µs]
                        change: [−99.907% −99.904% −99.900%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

snapshot/default_heap   time:   [10.766 µs 10.890 µs 11.032 µs]
                        change: [−53.139% −51.740% −50.439%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe
snapshot/50_mb_heap     time:   [12.270 µs 12.441 µs 12.641 µs]
                        change: [−99.866% −99.863% −99.860%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
snapshot/500_mb_heap    time:   [21.229 µs 21.496 µs 21.808 µs]
                        change: [−99.978% −99.977% −99.976%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  9 (9.00%) high mild
  3 (3.00%) high severe
snapshot/995_mb_heap    time:   [31.801 µs 32.432 µs 33.173 µs]
                        change: [−99.987% −99.987% −99.986%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

OLD PR Description

This pull request introduces snapshotting and restoring of sandbox state using dirty page tracking rather than copying the entire memory state each time, in many/most scenarios where sandboxes have larger than default amounts of memory this results in better perforamnce and reduced memory usage.

However, due to inefficiencies in the way dirty page tracking works on mshv and the fact that the implementation of tracking is not done for Windows this is not universally true.

Included below are screenshots showing the difference in time taken for some benchmarks with and without dirty page tracking enabled, along with some explanations of the differences seen.

The changes comprise:

Host Shared Memory Dirty Page tracking

Added a custom signal handler for SIGSEGV to support dirty page tracking for host memory mapped into a VM. Updated documentation to reflect this change and added debugging instructions for handling SIGSEGV in GDB and LLDB.

VM Dirty page tracking

Enabled dirty page tracking for mshv and KVM drivers to track changes made to memory in the guest

Snapshot Management

Added a new snapshot manager module to create, manage and restore memory snapshots.

Benchmarking Enhancements

Refactored the guest_call_benchmark_large_param function to support benchmarks for multiple parameter sizes and added a new sandbox_heap_size_benchmark function to measure sandbox creation performance with varying heap sizes.
Introduced guest_call_heap_size_benchmark to evaluate guest function call performance with different heap sizes.

Performance Changes

KVM Before

KVM After

KVM shows massive positive changes in performance when creating large sandboxes and calling functions in sandboxes with large memory configurations , although not measured the amount of memory consumed should have reduced considerably. In the scenario where large parameters are passed to the sandbox there is a degradation of performance, this has not been investigated yet, it may be due to the page based mechanism for saving/restoring data being more expensive than copying and restoring all the data. Other work to make large parameter passing more efficient will likely have a large positive impact here as well.

There is some regression in performance for small/default sandbox sizes which is more than likely caused by the overhead of tracking and building/restoring page based snapshots for small memory vs. copying and restoring the entire memory.

mshv2 Before

mshv2 After

mshv3 Before

mshv3 After

Both mshv2 and mshv3 show similar patterns to KVM in that the larger sandbox sizes show large improvements, the default/small sandboxes show regressions and the large parameter sizes show regressions, possibly for the same reason that the KVM one does , again no investigation done here yet.

The biggest difference between KVM and mshv is for two reasons, first when enabling dirty page tracking on mshv the first call to get dirty pages results in a returned bitmap showing all pages dirty, since this would cause us to snapshot all memory as a baseline after enabling dirty page tracking this PR gets the dirty pages immediately and then discards the result. This approach means that we have to make 2 calls to get dirty pages when we really only need one. The impact of this is quite large, especially with larger memory configurations since the call to get dirty pages seems to be O(n) where n is the number of pages in the memory configuration (regardless of if the pages are dirty or not), for a 950mb VM I observed ~1.4ms response for this call, however, this approach with larger sandboxes is still much quicker than copying all the memory. Fixing these issues in mshv will probably bring the performance much closer to KVM.

Windows 2025 Before

Windows 2025 After

Windows is largely either the same performance or has regressed, this is because the Windows implementation has not been done yet, at the moment each time dirt pages are requested Windows reports that all pages are dirty and the snapshots/restores are done on that basis, there is some overhead of this approach especially when restoring

jprendes

I haven't finished reviewing it yet, but these are my comments so far

jprendes · 2025-07-17T14:40:48Z

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

+            if base_pfn == u64::MAX {
+                base_pfn = mshv_region.guest_pfn;
+            }


This is effectively obtaining the guest_pfn of the first region.
Is there a guarantee on the order of the regions?
Why does this correspond to the guest page number?

Should this maybe be the smallest guest_pfn? (although I don't understand why 😅)

base_pfn = base_pfn.min(mshv_region.guest_pfn);

No, there is no guarantee of the order currently, but it happens to be true that they are in order and are contiguous because of the way we build them. It's probably something we want to enforce at some point

Don't we use a constant else where to specify the base address that we map to? In which case can we not compute the PFN based on the constant value (I think it may actually be zero) but the point is that if we base it on the constant and that ever changes or gets removed this code that depends on that will be impacted.

The first guest region's guest address is 0 (BASE_ADDRESS). The page frame number is obtained by right-shifting the guest address by 12. Note 1 << 12 == page size. I think it is better to calculate it directly like this, rather than assume that it is derived from BASE_ADDRESS, which may or may not be the case.

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

src/hyperlight_host/src/hypervisor/kvm.rs

jprendes

Some more comments.

src/hyperlight_host/src/mem/shared_mem.rs

jprendes

Looks great!
Thanks for working on this!

I left some more comments, but in general it's great.

jprendes · 2025-07-18T10:01:34Z

src/hyperlight_host/src/hypervisor/kvm.rs

+                    }
+                }
+            };
+            for page_idx in bit_index_iterator(&bitmap) {


I think it would be nice to use a dependency like bitvec.
Then we can do something like this:

let mut sandbox_dirty_pages = BitVec::new(); for (i, mem_region) = self.mem_regions.iter().take(n_contiguos).enumerate() { let num_pages = mem_region.guest_region.len() / PAGE_SIZE_USIZE; let bitmap = ...; let mut bitmap = BitVec::from_vec(bitmap); bitmap.truncate(num_pages); dirty.extend_from_bitslice(bitmap.as_bitslice()); } Ok(sandbox_dirty_pages)

src/hyperlight_host/src/mem/bitmap.rs

src/hyperlight_host/src/mem/shared_mem.rs

src/hyperlight_host/src/sandbox/initialized_multi_use.rs

src/hyperlight_host/src/sandbox/snapshot.rs

src/hyperlight_host/src/mem/shared_mem_snapshot.rs

jprendes · 2025-07-18T16:11:46Z

src/hyperlight_host/src/mem/shared_mem_snapshot.rs

+        current_dirty_pages: &[u64],
+        most_recent_snapshot: &Option<Arc<SharedMemorySnapshot>>,
+    ) -> Result<u64> {
+        if self.sandbox_size() != shared_mem.mem_size() {


hmm, I think we need to check more than just the sandbox size. It has to have the same layout.
What happens if we have a sandbox with 1M heap, 1k io, and 4k stack, and restore on it a sandbox with 1M stack, 4k io, and 1M heap? the main memory size would be the same, but then we would use the wrong Layout, right?

I think this was instroduced by the new snapshot API, we should create an issue to fix it.

Ah, I think the fact that we don't support restoring out of branch is the key that avoid this issue for now.

Actually you're right, this case is possible if we're restoring a brand new sandbox, with no previous snapshot, to a given snapshot.

Would it be an option not supporting that case, just like we do with out of branch snapshots?
Or would things fundamentally break? like... not being able to restore the first ever snapshot

If it's an option, I'm happy for us to create an issue and defer fixing it.

src/hyperlight_host/src/mem/shared_mem_snapshot.rs

jprendes · 2025-07-18T20:44:17Z

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

    mem_regions: Vec<MemoryRegion>,
-    n_initial_regions: usize,
+    // Size of the (contioous) sandbox mem_regions


Suggested change

// Size of the (contioous) sandbox mem_regions

// Size of the (contiguous) sandbox mem_regions

Should this be contiguous not continuous?

Fixed the comment :-)

…parameters Signed-off-by: Ludvig Liljenberg <[email protected]>

…ctions. Signed-off-by: Ludvig Liljenberg <[email protected]>

…ges in the host Signed-off-by: Ludvig Liljenberg <[email protected]>

Signed-off-by: Ludvig Liljenberg <[email protected]>

simongdavies · 2025-07-22T14:50:05Z

src/hyperlight_host/benches/benchmarks.rs

@@ -138,9 +214,98 @@ fn sandbox_benchmark(c: &mut Criterion) {
    group.finish();
 }

+// Sandbox creation with different heap sizes


I think we should reorientate the benchmarks, we should have a set of tests that we perform on sandboxes of differing sizes , for each sized sandbox we should:
measure the create time
measure the drop time
have guest calls with a varying number of parameters that does not reset the sandbox
have guest calls with a varying number of parameters that does reset the sandbox.
repeat the above where there are 1-n host function calls made by the guest function
measure the cost of snapshotting when modifying varying proportions of the sandbox memory
measure the cost of restoring when modifying varying proportions of the sandbox memory.
measure the cost of calling with the largest parameters that sized sandbox can support.

Filed #722 for this

simongdavies · 2025-07-22T14:51:19Z

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

    mem_regions: Vec<MemoryRegion>,
-    n_initial_regions: usize,
+    // Size of the (contioous) sandbox mem_regions


Should this be contiguous not continuous?

simongdavies · 2025-07-22T14:54:07Z

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

+            if base_pfn == u64::MAX {
+                base_pfn = mshv_region.guest_pfn;
+            }


Don't we use a constant else where to specify the base address that we map to? In which case can we not compute the PFN based on the constant value (I think it may actually be zero) but the point is that if we base it on the constant and that ever changes or gets removed this code that depends on that will be impacted.

simongdavies · 2025-07-22T14:56:31Z

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

+        vm_fd.get_dirty_log(base_pfn, total_size, CLEAR_DIRTY_BIT_FLAG)?;
+        #[cfg(mshv3)]
+        vm_fd.get_dirty_log(base_pfn, total_size, MSHV_GPAP_ACCESS_OP_CLEAR as u8)?;
+


Having looked at the perf here I wonder if we should at least measure what it would look like to track dirty pages in the guest, I think this should be reasonably doable now that the foundation has been done in the guest.

@simongdavies I am already working on (a superset of)) this

In the meantime, I wonder if a useful perf optimization is a handle to disable dirty page tracking because we expect that this sandbox will only ever be restored to a totally different one. Not sure if that would be useful ATM, but depending on how exactly we do snapshotting for AFD it could make sense.

@simongdavies I am already working on (a superset of)) this

To clarify: in order to achieve our security properties (i.e. an untrusted guest cannot corrupt the execution of or learn information from another guest, even one restored into the same sandbox), we can't trust the guest for dirty page tracking unless the guest is also managing copy-on-write with a hypervisor-enforced readonly underlying store. Therefore, I am looking at making the latter happen.

simongdavies · 2025-07-22T14:57:49Z

src/hyperlight_host/src/hypervisor/hyperv_linux.rs

@@ -885,6 +914,42 @@ impl Hypervisor for HypervLinuxDriver {
        self.interrupt_handle.clone()
    }

+    // TODO: Implement getting additional host-mapped dirty pages.


What does this TODO mean?

I put this here so we won't forget about it once additionally mmaped memory is not read-only, and we potentially want to be able to reset it as well

I would suggest this comment be made a little bit more explicit about e.g. "TODO: When regions of memory other than the main guest memory can be writable, ensure we don't miss any dirty pages in them" or something like that?

simongdavies · 2025-07-22T15:18:01Z

src/hyperlight_host/src/hypervisor/kvm.rs

@@ -750,6 +757,47 @@ impl Hypervisor for KVMDriver {
        self.interrupt_handle.clone()
    }

+    // TODO: Implement getting additional host-mapped dirty pages.


What does this TODO mean?

I put this here so we won't forget about it once additionally mmaped memory is not read-only, and we potentially want to be able to reset it as well

simongdavies · 2025-07-22T15:18:49Z

src/hyperlight_host/src/hypervisor/mod.rs

+    /// Get dirty pages as a bitmap (Vec<u64>).
+    /// Each bit in a u64 represents a page.
+    /// This also clears the bitflags, marking the pages as non-dirty.
+    /// TODO: Implement getting additional host-mapped dirty pages.


If this is about the additonal mmaped regions then I think at the moment these are read only so what is needed is metadata to recreate them?

I put this here so we won't forget about it once additionally mmaped memory is not read-only, and we potentially want to be able to reset it as well

simongdavies · 2025-07-22T15:28:36Z

src/hyperlight_host/src/mem/shared_mem.rs

+    ///
+    /// Additionally, writes to the returned slice will not mark pages as dirty.
+    /// User must call `mark_pages_dirty` manually to mark pages as dirty.
+    fn as_mut_slice(&mut self) -> &mut [u8] {


Is it possible to get rid of this? This is one of the reasons that I preferred the abstraction that signal handling gave, I think that this has the potential for creating difficult to find bugs?

No, most other methods on SharedMemory use this to write memory. At least it's private now

Maybe mark it as unsafe since there is an internal invariant that it enforces, to make callers read the docs, etc? Not sure if it is strictly unsafe in the Rust sense, but I suppose if you wrote other code that depended on the snapshot behaviour having what it ought to you could perhaps construct something.

Or, replace it with a version that takes offset/len and does the dirty marking itself... but that's probably not super ideal? Idk.

simongdavies · 2025-07-22T16:16:20Z

src/hyperlight_host/src/mem/shared_mem.rs

+    /// # Safety
+    /// This function is unsafe because it does not mark the pages as dirty.
+    /// Only use this if you are certain that the pages do not need to be marked as dirty.
+    #[instrument(err(Debug), skip_all, parent = Span::current(), level= "Trace")]


What circumstances is this used under?

We need these two for restoring a snapshot. Restoring a sandbox to a snapshotted state should not itself dirty the pages

simongdavies · 2025-07-22T16:19:06Z

src/hyperlight_host/src/mem/bitmap.rs

+// Prints the dirty bitmap in a human-readable format, coloring each page according to its region
+// NOTE: Might need to be updated if the memory layout changes
+#[allow(dead_code)]
+pub(crate) fn print_dirty_bitmap(bitmap: &[u64], layout: &SandboxMemoryLayout) {


Any reason we cant have this as a Debug trait impl?

We cannot implement Debug on Vec or slice. Would need a wrapper type. Also this takes a &SandboxMemoryLayout

simongdavies added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Jul 1, 2025

simongdavies linked an issue Jul 1, 2025 that may be closed by this pull request

Improve performance when creating "larger" Sanboxes #642

Open

simongdavies force-pushed the update-snapshot-and-restore branch 5 times, most recently from 7e85120 to 014eab3 Compare July 3, 2025 15:48

This was referenced Jul 3, 2025

Implement Dirty Page Tracking on Windows #688

Open

Create repro for mshv get dirty page issues #689

Open

Investigate why page tracking changes make calling functions with large parameters slower than previous implementation #690

Open

simongdavies changed the title ~~WIP: Use page tracking for snapshot and restore~~ Use page tracking for snapshot and restore Jul 3, 2025

simongdavies force-pushed the update-snapshot-and-restore branch from 014eab3 to 56fd983 Compare July 3, 2025 16:43

This comment was marked as outdated.

Sign in to view

ludfjig force-pushed the update-snapshot-and-restore branch 16 times, most recently from 04995b0 to 894f5fe Compare July 14, 2025 23:48

ludfjig force-pushed the update-snapshot-and-restore branch from 0808c97 to ed8a9d2 Compare July 17, 2025 05:04

ludfjig force-pushed the update-snapshot-and-restore branch 5 times, most recently from 456d533 to 58418bb Compare July 17, 2025 20:40

jprendes reviewed Jul 17, 2025

View reviewed changes

src/hyperlight_host/src/mem/shared_mem.rs Outdated Show resolved Hide resolved

src/hyperlight_host/src/mem/shared_mem.rs Outdated Show resolved Hide resolved

jprendes reviewed Jul 18, 2025

View reviewed changes

ludfjig force-pushed the update-snapshot-and-restore branch 3 times, most recently from 28b4d42 to 6a5458a Compare July 18, 2025 19:32

ludfjig marked this pull request as ready for review July 18, 2025 20:39

ludfjig requested review from danbugs, dblnz, devigned, syntactically and marosset as code owners July 18, 2025 20:39

jprendes reviewed Jul 18, 2025

View reviewed changes

ludfjig force-pushed the update-snapshot-and-restore branch from 6a5458a to 6f17e8b Compare July 18, 2025 21:54

ludfjig added 6 commits July 21, 2025 14:18

Add new benchmarks for varying sizes of sandboxes and guest function …

744cbd8

…parameters Signed-off-by: Ludvig Liljenberg <[email protected]>

Add API for getting VM's dirty pages, and add some bitmap utility fun…

722b7c9

…ctions. Signed-off-by: Ludvig Liljenberg <[email protected]>

Don't use shared_mem.as_mut_slice() since it doesn't dirty touched pa…

49a350a

…ges in the host Signed-off-by: Ludvig Liljenberg <[email protected]>

Track dirty pages from host side

36c495f

Signed-off-by: Ludvig Liljenberg <[email protected]>

Add debugging function for visualizing dirty pages in a specific layout

d9990c0

Signed-off-by: Ludvig Liljenberg <[email protected]>

Snapshot only contains pages dirties since last snapshot was taken

06e3edb

Signed-off-by: Ludvig Liljenberg <[email protected]>

ludfjig force-pushed the update-snapshot-and-restore branch from 6f17e8b to 06e3edb Compare July 21, 2025 21:33

simongdavies commented Jul 22, 2025

View reviewed changes

ludfjig mentioned this pull request Jul 22, 2025

Restructure micro benchmarks #722

Open

	// Size of the (contioous) sandbox mem_regions
	// Size of the (contiguous) sandbox mem_regions

Use page tracking for snapshot and restore #683

Are you sure you want to change the base?

Use page tracking for snapshot and restore #683

Uh oh!

Conversation

simongdavies commented Jul 1, 2025 • edited by ludfjig Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Host Shared Memory Dirty Page tracking

VM Dirty page tracking

Snapshot Management

Benchmarking Enhancements

Performance Changes

KVM Before

KVM After

mshv2 Before

mshv2 After

mshv3 Before

mshv3 After

Windows 2025 Before

Windows 2025 After

Uh oh!

This comment was marked as outdated.

Uh oh!

jprendes left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ludfjig Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ludfjig Jul 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jprendes left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jprendes left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jprendes Jul 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

simongdavies commented Jul 1, 2025 •

edited by ludfjig

Loading

ludfjig Jul 17, 2025 •

edited

Loading

ludfjig Jul 22, 2025 •

edited

Loading

jprendes Jul 18, 2025 •

edited

Loading

syntactically Jul 23, 2025 •

edited

Loading

syntactically Jul 23, 2025 •

edited

Loading

ludfjig Jul 22, 2025 •

edited

Loading